Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PWX-36477 : Updating Node PDB to allow k8s upgrades #1580

Merged
merged 10 commits into from
Jul 23, 2024

Conversation

svijaykumar-px
Copy link
Contributor

@svijaykumar-px svijaykumar-px commented Jun 17, 2024

What this PR does / why we need it: This PR takes care of updating the node PDBs for nodes that can be upgraded. First a list of cordoned nodes and nodes which have been selected for upgrade (PDB 0) are given as input to px api - FilterNonOverlappingNodes which gives an output of nodes that can be upgraded such that volume quorum is not lost. Then we check if user has provided any minAvailable value else we assume we have to maintain px quorum. If the provided value is invalid, we fall back to the quorum value. Then for every node in the response we change its PDB minAvailable value to 0 such that px quorum is maintained based on nodes that are already down and can go down.
In the case where user has disabled non disruptive upgrades, there is no call made to the api instead nodes are randomly selected for the upgrade based on minAvailable annotation provided by user. If this value is invalid, we only upgrade 1 node at a time.

Which issue(s) this PR fixes (optional)
Closes # PWX-36477, PWX-36478, PWX-36481, PWX-36504, PWX-36515

Special notes for your reviewer: UTs have been added for the cased but manual end to end testing is still pending

Copy link

codecov bot commented Jun 21, 2024

Codecov Report

Attention: Patch coverage is 74.86339% with 46 lines in your changes missing coverage. Please review.

Project coverage is 80.13%. Comparing base (ed8810d) to head (f223110).
Report is 1 commits behind head on PWX-25414-24.2.0.

Files Patch % Lines
...rs/storage/portworx/component/disruption_budget.go 72.82% 14 Missing and 11 partials ⚠️
pkg/util/k8s/k8s.go 0.00% 12 Missing ⚠️
drivers/storage/portworx/util/util.go 88.60% 5 Missing and 4 partials ⚠️
Additional details and impacted files
@@                 Coverage Diff                  @@
##           PWX-25414-24.2.0    #1580      +/-   ##
====================================================
- Coverage             80.20%   80.13%   -0.08%     
====================================================
  Files                    61       61              
  Lines                 18287    18464     +177     
====================================================
+ Hits                  14667    14796     +129     
- Misses                 2592     2624      +32     
- Partials               1028     1044      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

This PR is stale because it has been in review for 3 days with no activity.

@svijaykumar-px svijaykumar-px requested review from nikita-bhatia and a team July 11, 2024 10:03
drivers/storage/portworx/component/disruption_budget.go Outdated Show resolved Hide resolved
drivers/storage/portworx/component/disruption_budget.go Outdated Show resolved Hide resolved
drivers/storage/portworx/util/util.go Outdated Show resolved Hide resolved
drivers/storage/portworx/component/disruption_budget.go Outdated Show resolved Hide resolved
drivers/storage/portworx/util/util.go Outdated Show resolved Hide resolved
drivers/storage/portworx/util/util.go Outdated Show resolved Hide resolved
drivers/storage/portworx/util/util.go Outdated Show resolved Hide resolved
drivers/storage/portworx/util/util.go Outdated Show resolved Hide resolved
drivers/storage/portworx/util/util.go Outdated Show resolved Hide resolved
drivers/storage/portworx/components_test.go Outdated Show resolved Hide resolved
drivers/storage/portworx/components_test.go Outdated Show resolved Hide resolved
drivers/storage/portworx/components_test.go Show resolved Hide resolved
@@ -13633,6 +13639,329 @@ func TestDeleteNodePodDisruptionBudget(t *testing.T) {

}

func TestUpdateNodePodDisruptionBudgetParallelEnabled(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good set of UTs, but I don't see every condition being covered. Eg. Decommissioned nodes, px node status not being okay, non-quorum member nodes cordoned, etc.

Let's go through the code that you have added line by line and make sure we have a test case of each condition. One case even for an ORed and ANDed sub-condition.

Copy link
Contributor Author

@svijaykumar-px svijaykumar-px Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in UT cases for -

  • decommissioned node
  • px node status is down
  • non quorum member also cordoned
  • node without PDB cordoned
  • node version empty
  • When feature enabled, invalid minAvailable value, reconcile 2 times
  • when feature disabled, invalid minAvailable, reconcile 2 times

@svijaykumar-px
Copy link
Contributor Author

@piyush-nimbalkar I also wanted to check if I can use the function - NodesNeedingPDB instead of countStorageNodes() in updateMinAvailableForNodePDB. This is because the former function is simpler given all nodes are more than 3.1.2

Copy link
Contributor

@piyush-nimbalkar piyush-nimbalkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look at the UTs later and unblock this PR. You can go ahead and merge this PR and address the comments later if there are any.

@piyush-nimbalkar
Copy link
Contributor

@piyush-nimbalkar I also wanted to check if I can use the function - NodesNeedingPDB instead of countStorageNodes() in updateMinAvailableForNodePDB. This is because the former function is simpler given all nodes are more than 3.1.2

After looking at them, it doesn't look like they are exactly the same.

  • NodesNeedingPDB only looks at the PX nodes with corresponding k8s nodes, while CountStorageNodes will count the actual PX nodes. During an upgrade 2 PX nodes could have the same k8s node - the old that went down, and a new one that is coming up on the same k8s node.
  • NodesNeedingPDB ignores a cordoned k8s node, but we need to count that node in CountStorageNodes if a PX node is still running on it.

Does this make sense?

@svijaykumar-px svijaykumar-px merged commit c63bf13 into PWX-25414-24.2.0 Jul 23, 2024
6 checks passed
@piyush-nimbalkar piyush-nimbalkar deleted the PWX-36477 branch July 24, 2024 18:50
nikita-bhatia added a commit that referenced this pull request Aug 12, 2024
* [Cherry-pick into feature]: Cherry-pick of all tickets related to Smart and parallel upgrades merged into master (#1578)

* adding k8snode name label to px pods

* correcting go fmt error

* making requested changes

* Creating node PDB

* go fmt error

* Checking node version instead of cluster

* making requested changes

Conflicts:
	drivers/storage/portworx/component/disruption_budget.go

* aggregating errors

* adding logic to delete node PDB

* correcting go fmt errors

* made requested changes

* correcting after merge conflict

* [cherry-pick] PWX-36509 : StorageCluster schema changes to support parallel portworx upgrades (#1576)

* StorageCluster schema changes to support parallel portworx upgrades

Signed-off-by: hitesh-wani-px <[email protected]>

* change DisruptionSpec to Disruption and its description

Signed-off-by: hitesh-wani-px <[email protected]>

---------

Signed-off-by: hitesh-wani-px <[email protected]>

* PWX-36477 : Updating Node PDB to allow k8s upgrades (#1580)

* vendoring openstorage with nooverlappingnodes api

* adding logic to update node PDB

* Adding UTs and correcting logic

* go fmt error

* adding logic to delete cluster pdb after 3.1.2

* addressing comments requested

* skip checking node version if version is empty

* passing nodeIds instead of name to API

* making requested changes and adding UTs

* changing log to info

* PWX-36510: No disruption of volumes for portworx upgrades (#1612)

* non disruptive portworx upgrades

* adding new method to driver interface

* correcting UT

* goimport fmt

* gofmt file

* update PX version in test

* correcting new testcases

---------

Signed-off-by: hitesh-wani-px <[email protected]>
Co-authored-by: Swarupa Vijaykumar <[email protected]>
Co-authored-by: hitesh-wani-px <[email protected]>
Co-authored-by: svijaykumar-px <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants